Hierarchical Clustering Procedure for Grouping Orthologous Domains in Multiple Genomes
Abstract
Identification of orthologous genes, defined as homologous genes that diverged through speciation [1], is a crucial step in the comparative analysis of multiple genomes. The evidence used to identify orthologs includes: i) the highest level of similarity, ii) conservation of gene arrangement on the chromosome, and iii) consistency between the gene phylogeny and the species phylogeny. However, because of very large evolutionary distances and complex evolutionary events such as genome rearrangements and horizontal gene transfers, identifying orthologs among diverse microbial genomes is not a trivial task. Many previous studies (e.g. [2, 5]) relied mainly on best-hit relationships from all-against-all protein sequence comparisons between two genomes, but they generally required several additional procedures such as domain splitting, gene order comparison, and phylogenetic consideration. On the other hand, numerous clustering procedures have been developed for identifying homologous domains (e.g. [4]), but they are generally not suitable for ortholog grouping. Here, I propose a hierarchical clustering approach that is flexible enough to incorporate many of the above ortholog criteria. In particular, the procedure is suited to splitting genes into the minimal set of homologous domains required for ortholog grouping.
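To make the idea of hierarchical clustering on all-against-all comparison data concrete, the following is a minimal sketch of generic average-linkage (UPGMA-style) agglomerative clustering with a distance cutoff. It is not the procedure proposed in this abstract: it omits domain splitting, gene order, and phylogenetic checks. The function name `average_linkage`, the gene identifiers, the dissimilarity values, and the cutoff are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def average_linkage(genes, dist, cutoff):
    """Generic average-linkage clustering (illustrative, not the paper's method).

    genes  : list of gene identifiers
    dist   : dict mapping frozenset({a, b}) -> dissimilarity (hypothetical values)
    cutoff : merging stops once the closest pair of clusters exceeds this value
    Pairs absent from `dist` are treated as having no detectable similarity.
    """
    no_hit = float("inf")
    clusters = [frozenset([g]) for g in genes]

    def cluster_dist(c1, c2):
        # Average dissimilarity over all cross-cluster gene pairs.
        total = sum(dist.get(frozenset((a, b)), no_hit) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the closest pair of clusters and merge it if it is close enough.
        c1, c2 = min(combinations(clusters, 2), key=lambda p: cluster_dist(*p))
        if cluster_dist(c1, c2) > cutoff:
            break
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]

    return sorted(sorted(c) for c in clusters)


# Hypothetical toy data: genes from genomes A, B, C; A1 and A2 are distant paralogs.
genes = ["A1", "B1", "C1", "A2", "B2"]
dist = {
    frozenset(("A1", "B1")): 0.10,
    frozenset(("A1", "C1")): 0.20,
    frozenset(("B1", "C1")): 0.15,
    frozenset(("A2", "B2")): 0.10,
    frozenset(("A1", "A2")): 0.80,
}
groups = average_linkage(genes, dist, cutoff=0.5)
print(groups)
```

With these toy values the weakly similar paralogs A1 and A2 end up in separate groups, because the average dissimilarity between their clusters exceeds the cutoff. A production setting would more likely use an optimized library routine than this quadratic-per-step loop.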
Related articles
Hierarchical clustering algorithm for comprehensive orthologous-domain classification in multiple genomes
Ortholog identification is a crucial first step in comparative genomics. Here, we present a rapid method of ortholog grouping which is effective enough to allow the comparison of many genomes simultaneously. The method takes as input all-against-all similarity data and classifies genes based on the traditional hierarchical clustering algorithm UPGMA. In the course of clustering, the method dete...
COCO-CL: hierarchical clustering of homology relations based on evolutionary correlations
MOTIVATION Determining orthology relations among genes across multiple genomes is an important problem in the post-genomic era. Identifying orthologous genes can not only help predict functional annotations for newly sequenced or poorly characterized genomes, but can also help predict new protein-protein interactions. Unfortunately, determining orthology relation through computational methods i...
Comprehensive analysis of orthologous protein domains using the HOPS database.
One of the most reliable methods for protein function annotation is to transfer experimentally known functions from orthologous proteins in other organisms. Most methods for identifying orthologs operate on a subset of organisms with a completely sequenced genome, and treat proteins as single-domain units. However, it is well known that proteins are often made up of several independent domains,...
Orthologous Matrix (OMA) algorithm 2.0: more robust to asymmetric evolutionary rates and more scalable hierarchical orthologous group inference
Motivation Accurate orthology inference is a fundamental step in many phylogenetics and comparative analysis. Many methods have been proposed, including OMA (Orthologous MAtrix). Yet substantial challenges remain, in particular in coping with fragmented genes or genes evolving at different rates after duplication, and in scaling to large datasets. With more and more genomes available, it is nec...
eggNOG: automated construction and annotation of orthologous groups of genes
The identification of orthologous genes forms the basis for most comparative genomics studies. Existing approaches either lack functional annotation of the identified orthologous groups, hampering the interpretation of subsequent results, or are manually annotated and thus lag behind the rapid sequencing of new genomes. Here we present the eggNOG database ('evolutionary genealogy of genes: Non-...
Publication date: 2000